learn: prototype sandlock learn subcommand #113
Open
ghazariann wants to merge 10 commits into
Prototype implementation of `sandlock learn` (#72).

Summary

Adds `sandlock learn -o profile.toml -- <cmd>`, which runs a workload under observation and writes a sandlock profile TOML directly usable by `sandlock run -p profile.toml`.

- `on_file_access` audit hook in the supervisor on `openat`/`open`/`execve`/`execveat`
- writes are distinguished by the open flags (`O_WRONLY`/`O_RDWR`/`O_CREAT`)
- `on_net_connect` audit hook in the supervisor on `connect`/`sendto`/`sendmsg`

Implementation

Runs the workload under fully-permissive Landlock and intercepts syscalls via two
audit hooks added to the sandlock-core supervisor, called before dispatch on every
notification:

- `on_file_access(path, flags)`: `openat`/`open`/`execve`/`execveat`
- `on_net_connect(ip, port)`: `connect`/`sendto`/`sendmsg`

Results are collected and serialized to a `ProfileInput` TOML.

Discussion points

1. Capturing the executed binary
For `sandlock learn -- cat /etc/hostname`, `/usr/bin/cat` must appear in the profile so `sandlock run` can allow it via Landlock. The binary is loaded through `execve`, not `openat`, so the `on_file_access` hook alone is not enough.

Adding `execve` to the hook condition in `handle_notification` was not sufficient on its own. The seccomp BPF filter decides which syscalls generate a `SECCOMP_RET_USER_NOTIF`, and for a basic sandbox (`fs_read`, `fs_write`) `execve` was not in that list. It only entered `notif_syscalls` for heavier features (COW, chroot), so the notification never reached the supervisor.

The fix: a new `audit_file_access` feature flag in `SandboxFeatures` that is true when `on_file_access` is set. In `notif_syscalls_resolved` this adds `execve`/`execveat` to the BPF notif list. `resolve_path_for_notif` already handled `execve`, so no other supervisor logic changed.

Is this the right place to wire this? Should `execve` have its own separate hook (`on_execve`) instead of being folded into `on_file_access`?

2. Resolving the dynamic linker
After `execve`, the kernel maps the dynamic linker (e.g. `/lib64/ld-linux-x86-64.so.2`) in kernel space before transferring control to userspace. No syscall fires, so it never appears in the `on_file_access` trace. Without it in the profile, `sandlock run` fails: Landlock blocks the read of the linker and the process cannot start at all.

The current workaround parses the ELF `PT_INTERP` segment of the binary (ELF64 only) to recover the interpreter path. This is ad hoc and not portable (it assumes ELF64, specific header offsets, and manual endianness handling).

One idea: the linker appears in `/proc/<pid>/maps` after `execve` completes. The supervisor already reads `/proc/<pid>/maps` for vDSO patching (`maybe_patch_vdso`), so the pattern exists. But I'm not sure about the timing, because the `execve` notification fires before the kernel completes the exec.

I'm not that experienced in kernel development and would love guidance on the right approach here.

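For reviewers who want the `PT_INTERP` workaround spelled out, here is a minimal std-only sketch of that kind of lookup. It is not the parser from this PR: the function name `elf64_interp` and the helper `read_uint` are made up for illustration, and it assumes the standard ELF64 header layout (`e_phoff` at 0x20, `e_phentsize` at 0x36, `e_phnum` at 0x38).

```rust
/// Program header type for the interpreter path (value from the ELF spec).
const PT_INTERP: u64 = 3;

/// Read an `n`-byte unsigned integer at `off`, honoring the file's endianness.
/// Returns None instead of panicking when the range is out of bounds.
fn read_uint(d: &[u8], off: usize, n: usize, le: bool) -> Option<u64> {
    let s = d.get(off..off.checked_add(n)?)?;
    let fold = |acc: u64, &b: &u8| (acc << 8) | u64::from(b);
    Some(if le {
        s.iter().rev().fold(0, fold)
    } else {
        s.iter().fold(0, fold)
    })
}

/// Hypothetical helper: return the PT_INTERP path of an ELF64 image,
/// or None if the image is not ELF64, is malformed, or has no interpreter.
pub fn elf64_interp(d: &[u8]) -> Option<String> {
    // e_ident: magic, then EI_CLASS (2 = ELFCLASS64), then EI_DATA (1 = LE, 2 = BE).
    if !d.starts_with(b"\x7fELF") || *d.get(4)? != 2 {
        return None;
    }
    let le = match *d.get(5)? {
        1 => true,
        2 => false,
        _ => return None,
    };
    let phoff = read_uint(d, 0x20, 8, le)? as usize;
    let phentsize = read_uint(d, 0x36, 2, le)? as usize;
    let phnum = read_uint(d, 0x38, 2, le)? as usize;

    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        // Elf64_Phdr: p_type at +0 (4 bytes), p_offset at +8, p_filesz at +32.
        if read_uint(d, ph, 4, le)? != PT_INTERP {
            continue;
        }
        let off = read_uint(d, ph.checked_add(8)?, 8, le)? as usize;
        let len = read_uint(d, ph.checked_add(32)?, 8, le)? as usize;
        let raw = d.get(off..off.checked_add(len)?)?;
        // The segment holds a NUL-terminated path; cut at the first NUL.
        let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
        return String::from_utf8(raw[..end].to_vec()).ok();
    }
    None
}
```

Called on the bytes of a dynamically linked binary (for example from `std::fs::read`), this would return the interpreter path to add to the profile's read set. Statically linked binaries have no `PT_INTERP` and yield `None`.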
3. Achieving permissive Landlock during observation
The spec calls for permissive Landlock + seccomp-notify during observation. For reads this is straightforward: `.fs_read("/")`. For writes, `.fs_write("/")` is not an option: the observation run must be non-destructive, leaving no trace on the real filesystem.

The current approach pairs `.fs_read("/")` with `.workdir(tempdir)` (COW overlay): writes are granted everywhere but redirected to a temporary overlay, so the real filesystem is never touched. The `on_file_access` hook fires before the COW redirect, so it always sees the original requested path, which is what ends up in the profile.

Is COW the right way to achieve a fully permissive observation environment? Is there a lighter approach?

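The ordering argument (audit first, redirect second) can be pictured with a toy model. Nothing here is sandlock-core code: `redirect_into_overlay` and `handle_write` are hypothetical names, and the real COW layer is certainly more involved than a path join.

```rust
use std::path::{Path, PathBuf};

/// Toy stand-in for the COW redirect: map an absolute path into the overlay.
pub fn redirect_into_overlay(overlay: &Path, requested: &Path) -> PathBuf {
    let relative = requested.strip_prefix("/").unwrap_or(requested);
    overlay.join(relative)
}

/// Toy write handler: record the originally requested path in the audit
/// trace, and only then compute where the write actually lands.
pub fn handle_write(overlay: &Path, requested: &Path, audit: &mut Vec<PathBuf>) -> PathBuf {
    audit.push(requested.to_path_buf()); // what would end up in the profile
    redirect_into_overlay(overlay, requested) // what touches the disk
}
```

In this model the audit trace holds `/var/log/app.log` while the write lands under the overlay directory, which is the property the profile relies on.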
What still needs to be done
- `http.allow`
- `--learn-syscalls`
- `/proc` sampling

Tests

- `test_learn_captures_fs_read`: runs `cat /etc/hostname`, checks `/etc/hostname` appears under `read`
- `test_learn_then_run`: full round-trip; learn generates a profile from `cat /etc/hostname`, then `sandlock run` uses it
- `test_learn_captures_fs_write`: writes to a pre-existing `NamedTempFile` (the file exists before learn runs), checks the path appears under `write`
- `test_learn_new_file_collapses_to_parent`: writes to a file that does not exist; checks the profile records the parent directory, and the real file is never created (COW isolation)
- `test_learn_then_run_write`: round-trip for writes; learn captures a write, run actually creates the file
- `test_learn_captures_net_connect`: binds a real `TcpListener`, runs a Python connect, checks the address appears under `[network] allow`
- `test_learn_then_run_network`: round-trip for network; a single listener accepts two connections, one from learn and one from run

Test plan

- `cargo test -p sandlock-cli test_learn`
- `sandlock learn -o /tmp/profile.toml -- <cmd>`
- `sandlock run -p /tmp/profile.toml -- <cmd>`
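
As a reading aid for the tests above, here is a rough model of how the two hooks could bucket what they observe. The `Observed` type is hypothetical, and treating `O_RDWR` as write-only is an assumption of this sketch. The flag constants are the Linux x86-64 values, which real code would take from the `libc` crate.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;
use std::path::{Path, PathBuf};

// Linux open(2) flag values on x86-64 (assumption: real code would use `libc`).
const O_ACCMODE: i32 = 0o3;
const O_RDONLY: i32 = 0o0;
const O_CREAT: i32 = 0o100;

/// Hypothetical accumulator for what the audit hooks see during a learn run.
#[derive(Default, Debug)]
pub struct Observed {
    pub read: BTreeSet<PathBuf>,
    pub write: BTreeSet<PathBuf>,
    pub net: BTreeSet<(IpAddr, u16)>,
}

impl Observed {
    /// Classify a file access as read or write from its open flags.
    pub fn on_file_access(&mut self, path: &Path, flags: i32) {
        let wants_write = (flags & O_ACCMODE) != O_RDONLY || (flags & O_CREAT) != 0;
        if !wants_write {
            self.read.insert(path.to_path_buf());
            return;
        }
        // A write to a file that does not exist yet is recorded as its
        // parent directory, mirroring test_learn_new_file_collapses_to_parent.
        let recorded = match path.parent() {
            Some(parent) if !path.exists() => parent,
            _ => path,
        };
        self.write.insert(recorded.to_path_buf());
    }

    /// Record an outbound connection target.
    pub fn on_net_connect(&mut self, ip: IpAddr, port: u16) {
        self.net.insert((ip, port));
    }
}
```

Sets keep the result deduplicated and ordered, which should make the serialized profile stable across runs of the same workload.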